Improving Statistical Word Alignment with Ensemble Methods
نویسندگان
چکیده
This paper proposes an approach to improve statistical word alignment with ensemble methods. Two ensemble methods are investigated: bagging and cross-validation committees. On these two methods, both weighted voting and unweighted voting are compared under the word alignment task. In addition, we analyze the effect of different sizes of training sets on the bagging method. Experimental results indicate that both bagging and cross-validation committees improve the word alignment results regardless of weighted voting or unweighted voting. Weighted voting performs consistently better than unweighted voting on different sizes of training sets.
منابع مشابه
Ensemble methods for offline handwritten text line recognition
This thesis investigates ensemble methods for offline recognition of English handwritten text lines. Multiple recognisers are automatically generated from a single base recognition system. Combining the output of these multiple recognisers provides the final ensemble result. The underlying recognisers are based on hidden Markov models. One model is built for each character. Based on the lexicon...
متن کامل: Improving Domain-Specific Word Alignment with a General Bilingual Corpus
In conventional word alignment methods, some employ statistical models or statistical measures, which need large-scale bilingual sentencealigned training corpora. Others employ dictionaries to guide alignment selection. However, these methods achieve unsatisfactory alignment results when performing word alignment on a small-scale domain-specific bilingual corpus without terminological lexicons....
متن کاملAn Ensemble Model of Word-based and Character-based Models for Japanese and Chinese Input Method
Since Japanese and Chinese languages have too many characters to be input directly using a standard keyboard, input methods for these languages that enable users to input the characters are required. Recently, input methods based on statistical models have become popular because of their accuracy and ease of maintenance. Most of them adopt word-based models because they utilize word-segmented c...
متن کاملA Probability Model to Improve Word Alignment
Word alignment plays a crucial role in statistical machine translation. Word-aligned corpora have been found to be an excellent source of translation-related knowledge. We present a statistical model for computing the probability of an alignment given a sentence pair. This model allows easy integration of context-specific features. Our experiments show that this model can be an effective tool f...
متن کاملImproving Function Word Alignment with Frequency and Syntactic Information
In statistical word alignment for machine translation, function words usually cause poor aligning performance because they do not have clear correspondence between different languages. This paper proposes a novel approach to improve word alignment by pruning alignments of function words from an existing alignment model with high precision and recall. Based on monolingual and bilingual frequency...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005